The biological knowledge discovery by PCCF measure and PCA-F projection
Abstract
In biological knowledge discovery, PCA is commonly used to complement clustering analysis, but it typically gives poor visualizations for most gene expression data sets. Here, we propose the PCCF measure and use PCA-F to display clusters of PCCF; both PCCF and PCA-F are modeled from the modified cumulative probabilities of genes. Through the analysis of simulated and experimental data sets, we demonstrate that PCCF is more appropriate and reliable for analyzing gene expression data than other commonly used distance or similarity measures, and that PCA-F is a good visualization technique for identifying clusters of PCCF. We focus on data sets in which the expression values of genes are collected at different time points.
Related articles
Document clustering based on ontology and a fuzzy approach
Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, one of the important techniques of text mining, is the unsupervised classification of similar documents into different groups. The most important step...
The overall efficiency and projection point in network DEA
Data Envelopment Analysis (DEA) is one of the best methods for measuring the efficiency and productivity of Decision Making Units (DMUs). Evaluating the efficiency of DMUs that have two or more stages with conventional DEA models amounts to treating them as black boxes. This approach omits the effect of intermediate measures on efficiency. Therefore, just the first network inputs and t...
Improving Imbalanced data classification accuracy by using Fuzzy Similarity Measure and subtractive clustering
Classification is one of the important parts of data mining and knowledge discovery. In most cases, the data used for training is not well distributed. This inappropriate distribution occurs when one class has a large number of samples while the number of samples in the other class is inherently low. In general, the methods of solving this kind of prob...
The effect of knowledge based economic indicators on the countries' economic complexity
Countries’ economic growth and development depend significantly on their productive capacity. In this research, we aimed to investigate which components of a knowledge-based economy have a more meaningful role in production capacity. To measure production capacity, we used one of the most up-to-date indexes, the economic complexity index. The research used data panel consist...
Comparison of MLP NN Approach with PCA and ICA for Extraction of Hidden Regulatory Signals in Biological Networks
Biologists now face masses of high-dimensional datasets generated by various high-throughput technologies, which are the outputs of complex interconnected biological networks at different levels, driven by a number of hidden regulatory signals. So far, many computational and statistical methods such as PCA and ICA have been employed for computing low-dimensional or hidden represe...